Quantitative Data


FairFare: A Tool for Crowdsourcing Rideshare Data to Empower Labor Organizers

Calacci, Dana, Rao, Varun Nagaraj, Dalal, Samantha, Di, Catherine, Pua, Kok-Wei, Schwartz, Andrew, Spitzberg, Danny, Monroy-Hernández, Andrés

arXiv.org Artificial Intelligence

In recent years, labor organizers representing rideshare and delivery workers have advocated for regulations to improve working conditions in the rideshare industry that set wage floors and job loss protections [67]. To call for these improvements, organizers need to understand workers' existing conditions [37], a significant data access and social computing challenge in the rideshare industry. Labor organizers representing rideshare workers typically rely on a collage of qualitative anecdotes and screenshots to provide data about existing working conditions [24]. While these qualitative data provide rich, "thick descriptions" [30] of workers' experience, they are often dismissed by platforms as non-representative, cherry-picked examples. Rideshare platforms, on the other hand, have exclusive access to large-scale, comprehensive quantitative datasets of driver, trip, and pay data that they can draw upon to create authoritative narratives about working conditions in their industry [72]. Labor organizers need comprehensive access to large-scale quantitative data describing working conditions to conduct rigorous, independent investigations and contest platform-driven narratives. There are tools and legal frameworks that empower individual rideshare workers to independently access quantitative work data (e.g., Gridwise and Data Subject Access Requests). However, these tools and frameworks do not provide an intuitive way to aggregate individual worker data into a dataset that provides collective insight into overarching working conditions. Algorithmic auditing scholarship provides methods, like crowdsourcing data, to independently investigate black-boxed systems [66].


QuaLLM-Health: An Adaptation of an LLM-Based Framework for Quantitative Data Extraction from Online Health Discussions

Kouzy, Ramez, Attar-Olyaee, Roxanna, Rooney, Michael K., Hassanzadeh, Comron J., Li, Junyi Jessy, Mohamad, Osama

arXiv.org Artificial Intelligence

Health-related discussions on social media like Reddit offer valuable insights, but extracting quantitative data from unstructured text is challenging. In this work, we present a framework adapted from QuaLLM, QuaLLM-Health, for extracting clinically relevant quantitative data from Reddit discussions about glucagon-like peptide-1 (GLP-1) receptor agonists using large language models (LLMs). We collected 410k posts and comments from five GLP-1-related communities using the Reddit API in July 2024. After filtering for cancer-related discussions, 2,059 unique entries remained. We developed annotation guidelines to manually extract variables such as cancer survivorship, family cancer history, cancer types mentioned, risk perceptions, and discussions with physicians. Two domain experts independently annotated a random sample of 100 entries to create a gold-standard dataset. We then employed iterative prompt engineering with OpenAI's "GPT-4o-mini" on the gold-standard dataset to build an optimized pipeline that allowed us to extract variables from the large dataset. The optimized LLM achieved accuracies above 0.85 for all variables, with macro-averaged precision, recall, and F1 scores above 0.90, indicating balanced performance. Stability testing showed a 95% match rate across runs, confirming consistency. Applying the framework to the full dataset enabled efficient extraction of variables necessary for downstream analysis, costing under $3 and completing in approximately one hour. QuaLLM-Health demonstrates that LLMs can effectively and efficiently extract clinically relevant quantitative data from unstructured social media content. Incorporating human expertise and iterative prompt refinement ensures accuracy and reliability. This methodology can be adapted for large-scale analysis of patient-generated data across various health domains, facilitating valuable insights for healthcare research.
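The validation step described above can be sketched as follows: comparing LLM-extracted binary variables against gold-standard human annotations and reporting accuracy, precision, recall, and F1. The variable name and the labels are illustrative, not taken from the paper.

```python
# Sketch of evaluating an LLM-extraction pipeline against gold annotations.
def evaluate_variable(gold, predicted):
    """Accuracy, precision, recall, and F1 for one binary variable."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    accuracy = (tp + tn) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical gold labels vs. LLM output for a variable such as
# "cancer survivorship mentioned" (1 = mentioned, 0 = not mentioned)
gold = [1, 0, 1, 1, 0, 0, 1, 0]
pred = [1, 0, 1, 0, 0, 0, 1, 1]
scores = evaluate_variable(gold, pred)
```

In a workflow like the one described, scores such as these would be computed per variable on the 100-entry gold sample, and the prompt iterated until all variables cleared the accuracy and F1 thresholds before running the pipeline on the full dataset.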


Framework for developing quantitative agent based models based on qualitative expert knowledge: an organised crime use-case

Oetker, Frederike, Nespeca, Vittorio, Vis, Thijs, Duijn, Paul, Sloot, Peter, Quax, Rick

arXiv.org Artificial Intelligence

In order to model criminal networks for law enforcement purposes, a limited supply of data needs to be translated into validated agent-based models. What is missing in current criminological modelling is a systematic and transparent framework for modelers and domain experts that establishes a modelling procedure for computational criminal modelling, including the translation of qualitative data into quantitative rules. For this, we propose FREIDA (Framework for Expert-Informed Data-driven Agent-based models). Throughout the paper, the criminal cocaine replacement model (CCRM) is used as an example case to demonstrate the FREIDA methodology. In the CCRM, a criminal cocaine network in the Netherlands is modelled from which the kingpin node is removed; the goal is for the remaining agents to reorganize after the disruption and return the network to a stable state. Qualitative data sources such as case files, literature, and interviews are translated into empirical laws, and combined with quantitative sources such as databases they form the three dimensions (environment, agents, behaviour) of a networked ABM. Four case files are modelled and scored, for training and for validation respectively, in order to transition to the computational model and application phases. In the last phase, iterative sensitivity analysis, uncertainty quantification, and scenario testing eventually lead to a robust model that can help law enforcement plan their intervention strategies. Results indicate the need for flexible parameters as well as additional case file simulations to be performed.
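The disruption scenario above can be sketched minimally: remove the highest-degree ("kingpin") node from a small network and let the remaining agents reconnect until the network is whole again. The toy network and the reconnection rule are illustrative assumptions, not the CCRM itself.

```python
# Minimal kingpin-removal sketch on an adjacency-set graph.
def degree(graph, node):
    return len(graph[node])

def remove_node(graph, node):
    graph.pop(node)
    for nbrs in graph.values():
        nbrs.discard(node)

def components(graph):
    """Connected components via depth-first search."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def reorganize(graph):
    """Toy rule: bridge fragments via their highest-degree members."""
    comps = components(graph)
    while len(comps) > 1:
        a = max(comps[0], key=lambda n: degree(graph, n))
        b = max(comps[1], key=lambda n: degree(graph, n))
        graph[a].add(b)
        graph[b].add(a)
        comps = components(graph)

# Star-like network: node "K" is the kingpin bridging two cells
net = {
    "K": {"a", "b", "c", "d"},
    "a": {"K", "b"}, "b": {"K", "a"},
    "c": {"K", "d"}, "d": {"K", "c"},
}
kingpin = max(net, key=lambda n: degree(net, n))
remove_node(net, kingpin)
reorganize(net)  # remaining agents restore a single connected network
```

In FREIDA, rules of this kind would be derived from the qualitative sources (case files, interviews) rather than chosen ad hoc, and the resulting behaviour scored against held-out case files.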


Natural Language Processing: The Technology That's Biased

#artificialintelligence

Natural Language Processing (NLP) refers to building machines that can understand and respond to voice data with their own text and speech. Natural Language Processing falls under the umbrella of Artificial Intelligence (AI), and recent models like the Bidirectional Encoder Representations from Transformers (BERT), Generative Pre-Trained Transformer 3 (GPT-3), and Pathways Language Model (PaLM) have made accurate human-machine communication possible. These large language models (LLMs) are trained on massive volumes of text with billions of parameters and are able to understand and answer reading comprehension questions as well as generate new text such as a summary. Put simply, LLMs are trained to predict the next words in a sentence, such as by extending the autocomplete feature in messaging applications. But they can do much more, for example question answering, translation, image captioning, human-level dialogue agents, entity linking, or even data cleaning (for mixes of structured and unstructured data). NLP is already being used to automate some human tasks (RPA, robotic process automation); however, with the breathtaking advances of the last three years, NLP opens new potential for businesses to digitize company knowledge and disrupt incumbent business models.
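The "predict the next word" idea above can be illustrated with a toy bigram model that autocompletes from counts in a tiny corpus. Real LLMs use transformer networks over subword tokens and billions of parameters; this sketch only shows the prediction objective, and the corpus is made up.

```python
# Toy next-word predictor: counts which word follows which in a corpus.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ate the fish".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def autocomplete(word):
    """Return the word most frequently observed after `word`."""
    return follows[word].most_common(1)[0][0]

prediction = autocomplete("the")  # "cat" follows "the" most often here
```

An LLM does conceptually the same thing, but conditions on the whole preceding context rather than a single word, which is what makes summarization, translation, and dialogue possible.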


Comparing and extending the use of defeasible argumentation with quantitative data in real-world contexts

Rizzo, Lucas, Longo, Luca

arXiv.org Artificial Intelligence

Dealing with uncertain, contradicting, and ambiguous information is still a central issue in Artificial Intelligence (AI). As a result, many formalisms have been proposed or adapted so as to consider non-monotonicity, with only a limited number of works and researchers performing any sort of comparison among them. A non-monotonic formalism is one that allows the retraction of previous conclusions or claims, from premises, in light of new evidence, offering some desirable flexibility when dealing with uncertainty. This research article focuses on evaluating the inferential capacity of defeasible argumentation, a formalism particularly envisioned for modelling non-monotonic reasoning. In addition to this, fuzzy reasoning and expert systems, extended for handling non-monotonicity of reasoning, are selected and employed as baselines, due to their vast and accepted use within the AI community. Computational trust was selected as the domain of application of such models. Trust is an ill-defined construct; hence, reasoning applied to the inference of trust can be seen as non-monotonic. Inference models were designed to assign trust scalars to editors of the Wikipedia project. In particular, argument-based models demonstrated more robustness than those built upon the baselines, regardless of the knowledge bases or datasets employed. This study contributes to the body of knowledge through the exploitation of defeasible argumentation and its comparison to similar approaches. The practical use of such approaches coupled with a modular design that facilitates similar experiments was exemplified and their respective implementations made publicly available on GitHub [120, 121]. This work adds to previous works, empirically enhancing the generalisability of defeasible argumentation as a compelling approach to reason with quantitative data and uncertain knowledge.
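One core computation in defeasible argumentation can be sketched concretely: given a Dung-style abstract argumentation framework (arguments plus an attack relation), the grounded extension is the least fixed point of the characteristic function, accepting exactly the arguments that are defended against all attackers. The example framework below is illustrative, not one from the paper.

```python
# Grounded extension of an abstract argumentation framework,
# computed by iterating the characteristic function to a fixed point.
def grounded_extension(arguments, attacks):
    """attacks: set of (attacker, target) pairs."""
    def attackers(a):
        return {x for (x, y) in attacks if y == a}

    def defended(a, s):
        # every attacker of `a` must be counter-attacked by some member of `s`
        return all(any((d, b) in attacks for d in s) for b in attackers(a))

    s = set()
    while True:
        nxt = {a for a in arguments if defended(a, s)}
        if nxt == s:
            return s
        s = nxt

# a attacks b, b attacks c: a is unattacked, and a defends c against b
args = {"a", "b", "c"}
atts = {("a", "b"), ("b", "c")}
ext = grounded_extension(args, atts)
```

This captures the retraction behaviour the abstract describes: if a new argument attacking "a" were added, "a" would no longer be unattacked and both "a" and "c" could drop out of the extension.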


Complete Beginner's Guide to Analytics

#artificialintelligence

There's no one magic way to create an experience that will be universally and automatically loved. That's not the goal--rather, we seek to create experiences that will intuitively work for and delight a specific target audience. That's where analytics comes in. If you can't measure it, how will you know if it was successful? This is the question that drives UX practitioners to collect and analyze data.


Qualitative Data Can Provide Context and Meaning to Your Quantitative Data

#artificialintelligence

Someone once said "if you can't measure something, you can't understand it." Another version of this belief says: "If you can't measure it, it doesn't exist." This way of thinking is a fallacy, sometimes called the McNamara fallacy. This mindset can have dire consequences in national affairs as well as in personal medical treatment (such as the application of "progression-free survival" metrics in cancer patients, where the reduction in tumors is lauded as a victory while the corresponding reduction in quality of life is ignored). Similarly, in the world of data science and analytics, we are often drawn into this same way of thinking.


A Brief Introduction to the Concept of Data - KDnuggets

#artificialintelligence

Bio: Angelica Lo Duca (Medium) works as a post-doc at the Institute of Informatics and Telematics of the National Research Council (IIT-CNR) in Pisa, Italy. She is Professor of "Data Journalism" for the Master's degree course in Digital Humanities at the University of Pisa. Her research interests include Data Science, Data Analysis, Text Analysis, Open Data, Web Applications, and Data Journalism, applied to the fields of society, tourism, and cultural heritage. She used to work on Data Security, the Semantic Web, and Linked Data. Angelica is also an enthusiastic tech writer.


How to Improve UX with AI and Machine Learning - Unthinkable

#artificialintelligence

With the rapidly changing face of technology, AI has indeed reshaped the digital world. AI has created a positive impact on diverse sectors like finance, healthcare, retail, and more. But before we delve into how AI and ML are improving UX, let's have a look at what exactly UX means. User Experience (UX) encompasses all the aspects of the end user's interaction with the company, its services, its products, and the overall customer journey. The most crucial requirement for a great UX is meeting the exact customer needs and understanding their behavioral patterns.


Data Types in Statistics Used for Machine Learning.

#artificialintelligence

The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results. Statistics allows you to understand a subject much more deeply. To become a successful Data Scientist you must know the basics. Math and Stats are the building blocks of Machine Learning algorithms.